1 Introduction

The topic of this paper is a hierarchy of information-like functions, here
named the information correlation functions, where each function of the
hierarchy may be thought of as the information between the variables it
depends upon. The information correlation functions are particularly suited
to describing the emergence of complex behaviors due to many-body or
many-agent processes, and to quantifying how the information carried by a
set of variables or agents decomposes among the set and its subsets. In
more graphical language, they provide the information theoretic basis for
understanding the synergistic and non-synergistic components of a system,
and as such should serve as a forceful toolkit for the analysis of the
complexity structure of complex many-agent systems.

The information correlation functions are the natural generalization to
an arbitrary number of sets of variables of the sequence starting with the
entropy function (one set of variables) and the mutual information function
(two sets). We start by describing the traditional measures of information
(entropy) and mutual information.

Then the between operator is introduced and the algebra of the between 
operator is discussed. Then the information correlation function hierarchy is 
defined using the between operator, and we show that it has the properties 
desired of the information due to a set of random variables, but not due 
to any proper subset of those variables, facilitating the interpretation of it 
being the information between the named variables. 

We end with a graphical exploration of the ideas presented here using
the Ising model. The full Ising system mathematics is not discussed in detail
in this brief introduction, though it may be found in the author’s Ph.D.
dissertation [6]. For the Ising model we take subsets of sites in the Ising
system, and construct the information correlation functions for those sites.
The information correlation functions are found to describe well the regions
of the phase space of the Ising system where correlations occur that must be
described by high order distribution functions (“complex regions”), and the
regions where low order reductions of the distribution functions are sufficient
for description (“simple regions”).

In [6] the development of the information correlation functions proceeded
through an analysis of the cumulant hierarchy. The current discussion is far
simpler, while still capturing many of the subtleties.

The information correlation function hierarchy has been known for some
time in different guises, having been noted by the fluid theorists (see for
example [1]), but it appears not to have been interpreted before as an
information quantity measuring the information between a set of random
variables, nor does it appear to have been applied to infer regions of
complexity in systems. These are the contributions of the work presented here.

2 The information in one variable and the information between two variables

Given a random variable A , the Shannon entropy of A measures the uncer- 
tainty of the outcome of A before A is seen, and it measures the amount of 
information in A after A is seen. The Shannon entropy of A is given by 

$$S(A) := -\sum_{a} p(a) \log(p(a)) \tag{1}$$

where p(a) is the probability that symbol a ∈ A appears. (We confine the
discussion to the discrete case in this paper; the continuous case requires
only a brief extension. Also, all of the results of this paper hold when the
logarithm is taken with any base.)
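
As a small illustration of equation 1, the following Python sketch evaluates the Shannon entropy of a discrete distribution. The function name `shannon_entropy` and the choice of base 2 as the default are assumptions made here for illustration; they are not part of the paper.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy -sum p log p of a discrete distribution (equation 1).

    Zero-probability outcomes are skipped, following the convention
    that x log x tends to 0 as x tends to 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

# A fair coin is maximally uncertain; a certain outcome carries no information.
fair_coin = shannon_entropy([0.5, 0.5])
certain = shannon_entropy([1.0, 0.0])
```

Changing `base` only rescales the result, consistent with the remark that the results hold for any base of the logarithm.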

The mutual information [4] of, or the information between, two random 
variables A and B, is given by 

$$M(A, B) := \sum_{a,b} p(a, b) \log\left(\frac{p(a, b)}{p(a)\,p(b)}\right) \tag{2}$$

where the sum is over the mutually exclusive outcomes of the pair of random
variables (A, B), and where p(a, b) is the probability that symbol
(a, b) ∈ A × B appears. Examine equation 2 for a moment. When the joint
probability distribution p(a, b) describes two independent variables it
necessarily factors as p(a, b) = p(a)p(b), leading to the logarithm, and
therefore the mutual information, being zero: M(A, B) = 0. On the other hand,
if both of the processes A, B are identical, then p(a, b) = p(a) = p(b) when
a = b, and p(a, b) = 0 otherwise. Then for this case
$M(A, B) = -\sum_a p(a) \log(p(a)) = -\sum_b p(b) \log(p(b))$, which is seen
to be the Shannon entropy of either random variable. (Zero joint probability
cases make a contribution of zero to the mutual information since
$\lim_{x \to 0} x \log(x) = 0$.) The mutual information may also be seen as
the amount that the uncertainty in A is reduced when B is seen:

$$M(A, B) = S(A) - S(A \mid B) \tag{3}$$

where the second quantity is read as the “uncertainty of A given B”, and is
given by

$$S(A \mid B) := -\sum_{a,b} p(a, b) \log(p(a \mid b)) \tag{4}$$

where p(a | b) is the probability that a ∈ A appears given that b ∈ B is
seen, p(a | b) := p(a, b)/p(b) (and p(b) ≠ 0). It is straightforward to show
from this that the maximum mutual information possible when A is fixed is
M(A, B) = S(A): the minimal uncertainty that A can have after B is seen
(the minimal S(A | B)) is zero (which occurs when B is taken to be A, for
example). Note that the mutual information may be written also as

$$M(A, B) = S(B) - S(B \mid A) \tag{5}$$

and symmetrically as

$$M(A, B) = S(A) + S(B) - S(A, B). \tag{6}$$
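
The two limiting cases discussed above, and the agreement between the logarithmic-ratio form (equation 2) and the entropy form (equation 6), can be checked numerically. The sketch below is illustrative only; the helper names and the three example distributions are assumptions introduced here.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0.0)

def marginals(joint):
    """Marginal distributions p(a) and p(b) of a joint {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return pa, pb

def mutual_information(joint):
    """Equation 2: sum of p(a,b) log[p(a,b) / (p(a) p(b))]."""
    pa, pb = marginals(joint)
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0.0)

def mutual_information_from_entropies(joint):
    """Equation 6: S(A) + S(B) - S(A, B)."""
    pa, pb = marginals(joint)
    return entropy(pa) + entropy(pb) - entropy(joint)

independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # p(a,b) = p(a)p(b)
identical = {(0, 0): 0.5, (1, 1): 0.5}                        # B is a copy of A
noisy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # partial dependence
```

For the independent pair the mutual information vanishes, and for the identical pair it equals the entropy of either variable, as the text states.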

In the next sections we will develop the generalized mutual information.

3 The information between several variables


In the previous section the mutual information was discussed in the context 
of measuring the information between two sets of variables. What properties 
are desired of the “information between variables” for cases when more than 
two variables are involved, which we identify as the “generalized mutual 
information”? 

Consider three variables, A, B, and C. There are immediately the three
quantities S(A), S(B), and S(C) which quantify the uncertainties or
informations in single variables, but say little about information between any
of them. There are also the three mutual informations M(A, B), M(B, C),
and M(C, A) which together quantify the information between all pairs of
two variables. What, though, is the information between all three variables?
Could it be the quantity S(A, B) − S(A, B | C)? No, because this quantity
is not permutation symmetric, and we expect the information between a set
of random variables to be unchanged by their ordering. Thus symmetry is
an important property of the desired quantity.

Consider the notion of the information between a set of variables, that 
information due only to the whole set and not to any proper subset of the 
set. We might expect that this information, in the case where one variable 
of the set is independent of all others, is zero because there is a natural 
decomposition of the information that the variables provide into the infor- 
mations from the independent subsets. We see that the quantity above also 
fails to have the property that it generally is zero when any of the vari- 
ables is independent of the others. Generalizing to cases where there is a 
natural decomposition of the set into independent subsets, we have another 
important desired property of the information between a set of variables: 
The information between any set of variables should be zero whenever any 
proper subset of variables from the set is independent of its complement in 
the set. We call this property subset decomposition. 

Taking a heuristic approach, we might pretend that a between process
exists and take the information between three variables to be the
information in the between process of (i) the between process of the first
two variables and (ii) the third variable. For example, imagine a channel
between A and B through which passes only the necessary information in A
which could be used to reduce the uncertainty of B. This channel is then a
representation of the information between A and B. Call the process in this
channel A ∩ B. Then the information between all three variables is given by
(A ∩ B) ∩ C. If the between operation is to be meaningful in this sense, one
might expect that this reduction be associative, that is, the reduction above
should be equivalent to the reduction A ∩ (B ∩ C). This leads to
associativity as another desired property.

Similarly, commutativity is clearly a desired property, that is,
A ∩ B = B ∩ A is desired since the location of the channels is irrelevant in
the present discussion.

In the next section we define an algebra for the between operator, and
demonstrate that the desired properties are satisfied. The desired properties
for the information between variables are symmetry, associativity,
commutativity, and subset decomposition. We will assume known properties of
the entropy, the entropy of joint processes, and one other assumption, the
distributive property, and find that symmetry, associativity, and
commutativity are required for consistency, while subset decomposition
follows as an implicit property.

In later sections we define the information correlation function hierarchy 
and lend to each member of the hierarchy the interpretation of being the 
information between the named variables, the generalized mutual informa- 
tion. 

We end the paper with a concrete example using the Ising model, com- 
puting the entropies of subsets of sites in the Ising spin system, and con- 
structing the information correlation functions for those sites. The mutual 
information of a sequence of spin sites has a particularly simple form which 
aids in understanding the structure of the information correlation function 
hierarchy. We note that the information correlation functions serve as quan- 
titative measures in describing the hierarchy of subsystem complexities of a 
system. 

4 The algebra of the between operator (∩)

The information between two random variables $X_1$ and $X_2$ is the mutual
information of those variables. We motivate the definition of the between
operator by writing the mutual information as the entropy of the between
process

$$M(X_1, X_2) = S(X_1 \cap X_2) \tag{7}$$

We do not define the between process, nor do we ever need to, since
equation 7 may at any point be viewed as a purely motivational statement,
and since the definitional statements to follow may be taken without
reference to any between process. As previously stated, the notion of the
between process is a convenient heuristic in discussion. Define the joint
process operator of two random variables as

$$X_1 \cup X_2 := \{X_1, X_2\} \tag{8}$$

where we note that the joint process operator ∪ is intrinsically associative
and commutative. Equation 8 leads, using equations 2 and 7, to the equation
we take as the definition of the entropy of the between operator of two
processes,

$$S(X_1 \cap X_2) := S(X_1) + S(X_2) - S(X_1 \cup X_2) \tag{9}$$

Because the joint process operator ∪ is commutative, and because addition
is commutative, consistency requires that the between operator ∩ is
commutative,

$$X_1 \cap X_2 := X_2 \cap X_1. \tag{10}$$

We assume one more property, distributivity,

$$\begin{aligned}
(X_1 \cap X_2) \cup X_3 &:= (X_1 \cup X_3) \cap (X_2 \cup X_3) \\
(X_1 \cup X_2) \cap X_3 &:= (X_1 \cap X_3) \cup (X_2 \cap X_3).
\end{aligned} \tag{11}$$

Assuming consistency of equations 9, 10 and 11, the between process is
required to be associative. Thus

$$(X_1 \cap X_2) \cap X_3 := X_1 \cap (X_2 \cap X_3) \tag{12}$$

It is clear at this point that the ∩ and ∪ operators form a boolean algebra
equivalent to set intersection and union, respectively. As things stand
now, because of this algebra and equation 9 we are always able to reduce
the entropy of an expression involving the between operator to a sum of
entropies of expressions involving only joint processes. Since we understand
the entropy of joint processes without further clarification, the mathematics
of the entropy of expressions involving between processes is consistent. With
the understanding that values of entropy expressions involving the between
operator are to be determined by their equivalent expressions involving only
the joint operator, all values of entropies of expressions involving between
operators are well-defined.

The between operator has thus been defined at the level of entropies. The
degree to which the between process itself may be defined has significant
impact on the degree to which the entropy of an expression involving
the between operator may be interpreted as the entropy of a real process.
Since we do not construct between processes in this paper, this question of
interpretation remains open, and we only point to the fact that the
information correlation functions have all of the properties desired of the
information between, or generalized mutual information of, a set of
variables, except for one so-far unmentioned property: positivity. In fact,
it is not difficult to show that there exist joint processes of several
variables for which the entropy of the metaphoric between process is negative.
Thus it is ill-advised to attempt to construct a between process in general,
since the entropy of any discrete process is non-negative. A related
interpretation issue which remains is this sign issue: from the results to
follow, the magnitude of the information correlation functions is highly
indicative of the complexity due to the full process and not to any
subprocess; however, the dominant sign of the information correlation
functions changes from one order to the next. Perhaps there is a rational
interpretation in terms of the confusion a new member of the group of
variables brings to the group: perhaps the mathematics is simply telling us
that adding a new member to a group to bring the group to an odd number of
members is always prone to causing confusion!
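
One standard example of a joint process with a negative between-entropy, chosen here for illustration and not taken from the paper, is the exclusive-or of two fair bits. The Python sketch below reduces S((X1 ∩ X2) ∩ X3) to entropies of joint processes, as the algebra above prescribes, and evaluates it; the helper name `entropy_of` is an assumption.

```python
import math
from itertools import product

def entropy_of(joint, keep):
    """Entropy in bits of the marginal of `joint` on the positions in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0.0)

# X1 and X2 are independent fair bits; X3 = X1 XOR X2.
xor_joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

# Reduction of S(X1 n X2 n X3) to entropies of joint processes only.
singles = sum(entropy_of(xor_joint, (i,)) for i in range(3))
pairs = sum(entropy_of(xor_joint, pr) for pr in ((0, 1), (1, 2), (0, 2)))
triple = entropy_of(xor_joint, (0, 1, 2))
between_all_three = singles - pairs + triple

# Every pair of the three variables is independent on its own.
pairwise_mi = entropy_of(xor_joint, (0,)) + entropy_of(xor_joint, (1,)) \
    - entropy_of(xor_joint, (0, 1))
```

Here every pairwise mutual information is zero, yet the three-variable quantity is nonzero and negative, so no discrete process could have it as its entropy.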

In the next sections we present a different approach to the information
correlation functions, and demonstrate the subset decomposition property,
the only remaining property desired of the information between a set of
variables, the generalized mutual information.

5 The information correlation function hierarchy 

Using the algebraic properties of the information between random variables 
given by equations 8-11, we find that the information between any set of 
random variables may be expanded in terms of a sum of informations of 
single and joint processes. We have then the following hierarchy of functions, 
which we call the information correlation functions. 

$$\begin{aligned}
S(X_1) &= S(X_1) \\
S(X_1 \cap X_2) &= S(X_1) + S(X_2) - S(X_1 \cup X_2) \\
S(X_1 \cap X_2 \cap X_3) &= S(X_1) + S(X_2) + S(X_3) \\
&\quad - S(X_1 \cup X_2) - S(X_2 \cup X_3) - S(X_3 \cup X_1) \\
&\quad + S(X_1 \cup X_2 \cup X_3)
\end{aligned} \tag{13}$$

In general, the pattern is that the signs are $(-1)^{k+1}$, where k is the
number of random variables mentioned as arguments of a term, and every
nonempty subset of the variables appears as an argument exactly once.
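
The sign pattern of equation 13 extends to any number of variables as an alternating sum over nonempty subsets. The following Python sketch is one possible implementation under that reading; the function names and the dictionary representation of the joint distribution are assumptions made for illustration.

```python
import math
from itertools import combinations

def marginal_entropy(joint, keep):
    """Entropy in bits of the joint process of the variables indexed by `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0.0)

def information_correlation(joint, variables):
    """Alternating sum over nonempty subsets of `variables`.

    A subset of size k contributes the entropy of its joint process
    with sign (-1)**(k + 1), following the pattern of equation 13.
    """
    total = 0.0
    for k in range(1, len(variables) + 1):
        sign = (-1) ** (k + 1)
        for subset in combinations(variables, k):
            total += sign * marginal_entropy(joint, subset)
    return total

# Three perfectly correlated fair bits, and three independent fair bits.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}

order1 = information_correlation(copies, (0,))        # entropy of one bit
order2 = information_correlation(copies, (0, 1))      # mutual information
order3 = information_correlation(copies, (0, 1, 2))   # three-variable member
```

For the three identical bits each member of the hierarchy evaluates to one bit, while for fully independent bits the second and third order members vanish.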

There have been several previous characterizations of the redundancy of a
set of random variables (see for example [3]), all involving a comparison of
the full distribution of the variables to the product of the marginal
distributions of the single variables. However, none of these
characterizations is able to properly sort out the information due to a set
of more than two variables and not to any proper subset of that set. For more
details on this see [6].

6 Another look at the information correlation hierarchy

Consider the n variable joint process $(X_1, \dots, X_n)$ and let
$p_k(i_1, \dots, i_k)$, $k \le n$, denote the marginal probability
distribution of the k variable joint process $(X_{i_1}, \dots, X_{i_k})$,
with its arguments being given lowest index to highest index
($i_m < i_n$ for $m < n$). Now consider the hierarchy

$$\begin{aligned}
\log(p_1(i)) &= \phi_1(i) \\
\log(p_2(i,j)) &= \phi_2(i,j) + \phi_1(i) + \phi_1(j) \\
&\;\;\vdots \\
\log(p_n(1,\dots,n)) &= \phi_n(1,\dots,n) + \dots + \phi_2(1,2) + \phi_2(1,3) + \dots + \phi_1(1) + \dots + \phi_1(n)
\end{aligned} \tag{14}$$

We may always solve equations 14 for the $\phi_k$. The result of doing this is

$$\begin{aligned}
\phi_1(i) &= \log(p_1(i)) \\
\phi_2(i,j) &= \log(p_2(i,j)) - \log(p_1(i)) - \log(p_1(j)) \\
&\;\;\vdots \\
\phi_n(1,\dots,n) &= \log(p_n(1,\dots,n)) - \sum_{[n-1]} \log(p_{n-1}) + \dots + (-1)^{n+1} \sum_{[1]} \log(p_1)
\end{aligned} \tag{15}$$

where $\sum_{[k]}$ indicates that the summation is over all subsets of k
arguments. Now, multiply the right sides of equations 15 by $(-1)^n$ and then
average over $p_n$ to find, as we will see immediately after the next
equation, the hierarchy of information correlation functions
$C_k := \langle (-1)^k \phi_k \rangle$:

$$\begin{aligned}
C_1(i) &:= -\sum_{i} p_1(i) \log(p_1(i)) \\
C_2(i,j) &:= \sum_{i,j} p_2(i,j) \log\left(\frac{p_2(i,j)}{p_1(i)\,p_1(j)}\right) \\
&\;\;\vdots \\
C_n(1,\dots,n) &:= (-1)^n \sum_{1,\dots,n} p_n \log\left(\frac{p_n}{\prod_{[n-1]} p_{n-1}} \times \cdots \times \Big(\prod_{[1]} p_1\Big)^{(-1)^{n+1}}\right)
\end{aligned} \tag{16}$$

where $\prod_{[m]}$ indicates that the product is over all subsets of m
arguments.
Comparing equations 13 and 16 shows that 


$$S(X_1 \cap \dots \cap X_n) = C_n(1, \dots, n) \tag{17}$$

so that at this point in the paper we have developed the information
correlation hierarchy in two distinct ways, the first development being with
the boolean process algebra of section 4. Further, we note that the $\phi_k$
are the terms in the exponents of a product expansion of $p_n$ in terms of
subsets of its arguments. From equations 15 we easily find

$$p_n = e^{\phi_n} \times \prod_{[n-1]} e^{\phi_{n-1}} \times \dots \times \prod_{[1]} e^{\phi_1} \tag{18}$$

From equation 18 it is clear that if the probability distribution $p_n$
factors into the product of two probability distributions, $p_r$ and $p_s$
with n = r + s (which occurs iff the sets of variables corresponding to the
arguments of $p_r$ and $p_s$ are independent of each other), then there will
be no $\phi_k$ present for k > max(r, s). Thus such $\phi_k$, and their
corresponding information correlation functions, taken in this paper to
represent the generalized mutual information of, or information between, the
arguments taken as a set, are zero. This establishes the last desired
property, that whenever two such sets of independent variables occur, there
will be no information between the full set of variables.
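
The subset decomposition argument can be checked numerically by building the highest-order term of equation 15 pointwise and averaging it as in equation 16. The sketch below is illustrative only: the function names, the use of natural logarithms, and the example distribution (a correlated pair joined with an independent third variable) are assumptions introduced here.

```python
import math
from itertools import combinations

def marginal(joint, keep):
    """Marginal distribution of `joint` on the index positions in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return marg

def phi(joint, outcome):
    """Highest-order term phi_n of equation 15, evaluated at one outcome."""
    n = len(outcome)
    value = 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            p_sub = marginal(joint, subset)[tuple(outcome[i] for i in subset)]
            value += (-1) ** (n - k) * math.log(p_sub)
    return value

def c_n(joint):
    """Average of (-1)**n * phi_n over p_n, as in equation 16."""
    n = len(next(iter(joint)))
    return (-1) ** n * sum(p * phi(joint, x) for x, p in joint.items() if p > 0.0)

# (X1, X2) are correlated; X3 is independent of both.
pair = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
third = {0: 0.3, 1: 0.7}
factored = {(a, b, c): pab * pc for (a, b), pab in pair.items()
            for c, pc in third.items()}

c3_factored = c_n(factored)   # expected to vanish by subset decomposition
c2_pair = c_n(pair)           # mutual information of the pair, in nats
```

Because the joint distribution factors, the third order term is zero at every outcome, and so is its average, while the second order function of the correlated pair remains positive.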

In the next sections we develop the entropies of the Ising system, develop
the information correlation function hierarchy from these entropies, and
then demonstrate that the information correlation functions are useful in
determining where in the parameter space of the Ising system “complex”
behavior (here behavior requiring more of the Ising spins to describe) is
located, and conversely, where the Ising system behaves “simply”.

7 Information correlation functions in the Ising System

The following graphs present the various quantities that might be considered
in an analysis of a complex system, for the linear (1D) Ising system.
These quantities are entropies, moments, correlations, cumulants, and the
information correlation functions.

Briefly, the Ising system considered here is a one-dimensional chain of
nearest neighbor coupled spins, taking the values ±1, with energy
(Hamiltonian) of the state σ (representing the values of the spins) given by

$$E(\sigma) = -\sum_{i=1}^{n} \left[ Q s_i s_{i+1} + R s_i \right]. \tag{19}$$

Here, the parameters Q and R are the spin-spin coupling strength and the
spin-external field strength respectively, and the probability of a state is
proportional to $e^{-\beta E(\sigma)}$, with $\beta = 1/kT$ and T the
temperature. The normalization constant (partition function) is given by

$$Z_n(\beta) = \sum_{\sigma} e^{-\beta E(\sigma)} \tag{20}$$

so that P(σ) is given by

$$P(\sigma) = \frac{e^{-\beta E(\sigma)}}{Z_n(\beta)}. \tag{21}$$

The parameter b seen on the graphs is the β defined above, proportional
to the inverse temperature. The parameter Q is taken to be one, so that
positive values of b correspond to the ferromagnetic region, while negative
values of b correspond to the antiferromagnetic region.
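
For small chains the quantities above can be obtained by brute-force enumeration rather than in closed form. The Python sketch below makes several assumptions that are not stated in the paper: an open chain of eight spins with free ends, natural logarithms, and the particular values of β and R. It is meant only as a numerical cross-check of the definitions, not as the method used to produce the graphs.

```python
import math
from itertools import combinations, product

def ising_distribution(n_spins, beta, Q=1.0, R=0.0):
    """Boltzmann distribution over all states of an open chain (assumed free ends)."""
    weights = {}
    for state in product((-1, 1), repeat=n_spins):
        coupling = sum(state[i] * state[i + 1] for i in range(n_spins - 1))
        energy = -(Q * coupling + R * sum(state))      # equation 19
        weights[state] = math.exp(-beta * energy)
    z = sum(weights.values())                          # equation 20
    return {s: w / z for s, w in weights.items()}      # equation 21

def marginal_entropy(joint, keep):
    """Entropy in nats of the spins at the site indices in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marg.values() if p > 0.0)

def information_correlation(joint, sites):
    """Alternating sum of subset entropies, following the pattern of equation 13."""
    return sum((-1) ** (k + 1) * marginal_entropy(joint, subset)
               for k in range(1, len(sites) + 1)
               for subset in combinations(sites, k))

dist = ising_distribution(n_spins=8, beta=-1.0, R=0.5)   # antiferromagnetic side
c2 = information_correlation(dist, (3, 4))        # two neighboring spins
c3 = information_correlation(dist, (3, 4, 5))     # three neighboring spins
end_spin_mi = information_correlation(dist, (3, 5))   # first and last of the three
```

Under the open-chain assumption the three-spin value agrees in magnitude with the mutual information of the first and last spins of the triple, and at β = 0 all correlations of order two and higher vanish.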

The mathematics presented in [6] concerns the calculation of the various 
quantities mentioned above in closed form for this system, for any subset of 
the spins. In particular, the graphs shown here range over sets of spins of 
sizes one to four. 

The important thing to note, above all, is that the information correlation
functions are the most diagnostic for complexity structure occurring
because of the synergistic interaction of the spins. They indicate when a set
of spins contains information not attributable to any subset of the set, the
synergistic content. They are zero whenever the set of spins may be
decomposed into subsets which behave independently of one another, and
nonzero otherwise.




Graphs of the entropy per spin for orders 1-4, including both the
ferromagnetic (β > 0) and the antiferromagnetic (β < 0) regions, appear in
figures 1-4. Note the sharp difference in the antiferromagnetic case between
the first and second order entropies, and the marked similarities between the
second and higher order entropies. Note that the logarithms are base e, and
that β appears as b in the axis label.

Graphs of the information correlation functions of 2, 3 and 4 neighboring
spins are given in figures 5-7. Note that these functions are always
largest in the regions of antiferromagnetic behavior where it is impossible
to decompose the system into simpler subsystems.

Graphs of the moments of orders 1 through 4 of neighboring spins appear
in figures 8-11. As seen in the graphs, the moments are particularly
diagnostic of transition regions, but not of complexity.

Graphs of the correlation functions of orders 2 through 4 of neighboring
spins appear in figures 12-14.

The graph of the cumulant function of order 4 of neighboring spins ap- 
pears in figure 15 (the first three correlation and cumulant functions are 
equal by order). 


[Plot: IsingEnt1, Q=1]
Figure 1: First order entropy per spin of the Ising system. Entropy of one
spin.

[Plot: IsingEnt2, Q=1]
Figure 2: Second order entropy per spin of the Ising system. Note the
difference between the first and second order entropies, and the similarities
of the second-fourth order entropies. The bump on the antiferromagnetic
phase side occurs as the external field is increased, and indicates a transition
between the ↑↓⋯ states and the ↑↑⋯ state.

[Plot: IsingEnt3, Q=1]
Figure 3: Third order entropy per spin of the Ising system.

[Plot: IsingEnt4, Q=1]
Figure 4: Fourth order entropy per spin of the Ising system.

[Plot: InfCor2, Q=1]
Figure 5: Second order information correlation function of the Ising system.
Information correlation of two neighboring spins. Note that the information
correlation functions are similar up to a sign at each order for this system.
The information correlation functions are the mutual information of the
first and last spins along the chain in the k spins considered at order k,
times $(-1)^k$.

[Plot: InfCor3, Q=1]
Figure 6: Third order information correlation function of the Ising system.
Information correlation of three neighboring spins.

[Plot: InfCor4, Q=1]
Figure 7: Fourth order information correlation function of the Ising system.
Information correlation of four neighboring spins.

Figure 8: First order moment - moment of one spin. Note the transitional
behavior in the region of the increase in the entropy, where the states ↑↓⋯
become dominated by ↑↑⋯.

[Plot: IsingMom2, Q=1]
Figure 9: Second order moment of two neighboring spins.

[Plot: IsingCor4, Q=1]
Figure 14: Fourth order correlation of four

[Plot: IsingCum4, Q=1]
Figure 15: Fourth order cumulant of four